Estimating average wind speed in Thailand using confidence intervals for common mean of several Weibull distributions

The Weibull distribution has been used to analyze data from many fields, including engineering, survival and lifetime analysis, and weather forecasting, particularly wind speed data. It is useful to measure the central tendency of wind speed data in specific locations using statistical parameters such as the mean in order to forecast the severity of future catastrophic events more accurately. In particular, the common mean of several independent wind speed samples collected from different locations is a useful statistic. To explore wind speed data from several areas in Surat Thani province, a large province in southern Thailand, we constructed estimates of the confidence interval for the common mean of several Weibull distributions using the Bayesian equitailed confidence interval and the highest posterior density interval, both based on the gamma prior. Their performances are compared with those of the generalized confidence interval and the adjusted method of variance estimates recovery based on their coverage probabilities and expected lengths. The results demonstrate that when the common mean is small and the sample size is large, the Bayesian highest posterior density interval performed the best since its coverage probabilities were higher than the nominal confidence level and it provided the shortest expected lengths. Moreover, the generalized confidence interval performed well in some scenarios whereas the adjusted method of variance estimates recovery did not. The approaches were used to estimate the common mean of real wind speed datasets from several areas in Surat Thani province, Thailand, fitted to Weibull distributions. These results support the simulation results in that the Bayesian methods performed the best. Hence, the Bayesian highest posterior density interval is the most appropriate method for establishing the confidence interval for the common mean of several Weibull distributions.


INTRODUCTION
Greenhouse gases are produced by both natural processes and human activity, especially the burning of fossil fuels for electricity generation. Greenhouse gases absorb infrared radiation, or heat radiation, that radiates off the surface of the earth. When greenhouse gases are abundant, infrared radiation cannot escape the atmosphere, which raises the average global temperature and triggers extreme weather events. In 2017, Thailand ranked 20th in the world for greenhouse gas emissions. It is located in the equatorial region, which is influenced by ocean currents that produce heavy rain and high wind speeds during the monsoon season from mid-May to mid-October. These phenomena can be hazardous to both humans and animals, causing catastrophes that affect agricultural productivity, which is an important part of Thailand's economy. The southern region of Thailand is a coastal area that is influenced by the southeast monsoon winds, and Surat Thani is a province on the southeastern coast of Thailand located on a peninsula that juts out into the sea. Thus, monitoring the wind speed to quantify and predict its potential intensity is a useful endeavor. Several distributions are suitable for studying wind speed data, which may differ depending on the month, season, and year. One of these is the Weibull distribution, which has been applied in several studies analyzing wind speed. Genc et al. (2005) studied wind power potential by estimating the two parameters of a Weibull distribution. Dokur & Kurban (2015) used the Weibull distribution to determine the wind energy potential in the Bilecik region and provided estimates of its parameters. Sasujit & Dussadee (2016) used the Weibull distribution to provide an assessment of wind energy and electricity generation in northern Thailand. Islam, Dussadee & Chaichana (2016) used it to estimate the wind power potential on Saint Martin's Island in Bangladesh.
La-ongkaew, applied the coefficient of variation of the Weibull distribution to estimate the dispersion of wind speed data in Thailand. Shu & Jesson (2021) assessed the characteristics of wind speed datasets by using Weibull distributions. Besides wind speed data, the Weibull distribution has been applied in studies in other areas. For example, it has been used to assess the survival time of guinea pigs injected with varying doses of tubercle bacilli (Bjerkedal et al., 1960), the failure times of air-conditioning systems in two airplanes (Proschan, 1963), the amounts of insurance claims (Hamza & Sabri, 2022), the shelf life of Pezik pickles (Keklik, Isikli & Sur, 2017), and the moisture content of milled rice (Ling, Teng & Lin, 2018).
The mean is a very important statistic for measuring the central tendency of a dataset and has been used in many applications; e.g., the amount of nitrogen-bound bovine serum albumin in mice (Hand et al., 1993; Schaarschmidt, 2013; Sadooghi-Alvandi & Malekzadeh, 2014), the amount of selenium in non-fat milk powder (Philip, Sun & Sinha, 1999), the CD4+ cell counts of HIV patients after initiating antiviral therapy (Liang, Su & Zou, 2008), and rainfall in Chiang Mai, Thailand (Maneerat, Niwitpong & Niwitpong, 2019). Interval estimation for the mean of a distribution has been investigated by several research groups. Chen & Mi (1996) applied several maximum likelihood estimators for constructing the confidence interval for the mean of an exponential distribution based on grouped data. Colosimo & Ho (1999) estimated the confidence interval for the mean of a Weibull distribution for lifetime analysis based on censored reliability datasets. Peng (2004) provided estimates for the confidence interval for the mean of heavy-tailed distributions. Krishnamoorthy, Lin & Xia (2009) established estimates for the confidence interval for the mean of a Weibull distribution using the generalized variable approach and Wald confidence intervals. Thangjai, Niwitpong & Niwitpong (2020) applied Bayesian methodology to construct estimates of the confidence interval for the mean of a normal distribution with an unknown coefficient of variation. Maneerat, Nakjai & Niwitpong (2022) proposed using Bayesian noninformative priors to estimate the confidence interval for the mean of a three-parameter delta-lognormal distribution. Moreover, interval estimates for functions of the mean, such as the difference between and the ratio of two means, have also been reported. Lee & Lin (2004) used the generalized confidence interval (GCI) approach to estimate the confidence interval for the ratio of the means of two normal populations.
Niwitpong & Niwitpong (2010) proposed estimates for the confidence interval for the difference between the means of two normal populations where the ratio of their variances is known. Niwitpong, Koonprasert & Niwitpong (2012) proposed estimates for the confidence interval for the difference between the means of several normal populations with known coefficients of variation. Thangjai, Niwitpong & Niwitpong (2017) used the GCI and large sample approaches to estimate the confidence interval for the mean and the difference between the means of several normal distributions with unknown coefficients of variation. Maneerat & Niwitpong (2020) compared medical care costs by using Bayesian intervals for the ratio of the means of several delta-lognormal distributions.
Since it is common practice to collect data in different settings, inference based on the common mean of several populations is useful. Indeed, many researchers have estimated the confidence interval for this scenario. Krishnamoorthy & Lu (2003) used the concept of the generalized p-value to estimate the confidence interval for the common mean of several normal populations. Lin & Lee (2005) proposed a generalized pivotal quantity (GPQ) using the best linear unbiased estimator for estimating the confidence interval of the common mean of several normal populations when the variances are unknown. Later, Ye, Ma & Wang (2010) provided interval estimation for the common mean of several inverse Gaussian populations when the scale parameters are unknown. Behboodian & Jafari (2014) used the GCI approach to determine the confidence interval for the common mean of several lognormal populations, while Smithpreecha, Niwitpong & Niwitpong (2018) proposed new methods to calculate the confidence interval for the common mean of several lognormal distributions based on the GCI and adjusted method of variance estimates recovery (MOVER) methods. Lin & Wu (2011) proposed an estimation method based on a higher-order likelihood-based procedure for the confidence interval for the common mean of several inverse Gaussian distributions. Maneerat & Niwitpong (2021) estimated the confidence interval for the common mean of several delta-lognormal populations using the fiducial GCI (FGCI), large sample, MOVER, parametric bootstrap, and highest posterior density (HPD) intervals using the Jeffreys' rule or normal-gamma-beta prior.
In the present study, our goal was to compare the wind speed data from several locations to help predict the occurrence of severe wind speed events. Since Surat Thani is a large province on the southeast coast of Thailand, using the common mean of the wind speed datasets from different areas will help in this endeavor, and thus estimating the confidence interval for the common mean of several Weibull populations becomes important. The benefit of this study is that it will assist provincial authorities in estimating and predicting wind speed in order to monitor the occurrence of severe wind events. However, the common mean of several Weibull populations has not previously been investigated, and there is no previous study on measuring the common mean of wind speed data. To fill this gap, we propose novel methods for constructing the confidence interval for the common mean of several Weibull distributions, motivated by the measurement of central tendency in wind speed data. We used Bayesian methodology for the equitailed confidence interval and the HPD interval based on the gamma prior to estimate the confidence interval for the common mean of several Weibull distributions and compared their performances with those of GCI and adjusted MOVER. Furthermore, we applied these methods to real wind speed datasets from several locations in Surat Thani province, Thailand. The paper is organized as follows. The parameter of interest of the Weibull distribution is introduced, and the details of all proposed methods are described in the section ''Materials & Methods''. Numerical results are reported in the next section. In the application section, wind speed data from Khiri Rat Nikhom, Koh Samui, and Kanchanadit in Surat Thani province, Thailand are used to illustrate the methods. Finally, a discussion and conclusions are provided in the last section.

MATERIALS & METHODS
A flowchart of the research methodology is shown in Fig. 1.
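Suppose that $X_i = (X_{i1}, X_{i2}, \ldots, X_{in_i})$ are random samples from Weibull populations with sizes $n_i$, scale parameters $c_i$, shape parameters $k_i$, and probability density function (pdf)

$$f(x_{ij}; c_i, k_i) = \frac{k_i}{c_i}\left(\frac{x_{ij}}{c_i}\right)^{k_i-1}\exp\left[-\left(\frac{x_{ij}}{c_i}\right)^{k_i}\right], \quad x_{ij} > 0,$$

for $i = 1,2,\ldots,p$ and $j = 1,2,\ldots,n_i$. The cumulative distribution function is defined by

$$F(x_{ij}; c_i, k_i) = 1 - \exp\left[-\left(\frac{x_{ij}}{c_i}\right)^{k_i}\right].$$

The parameters $c_i$ and $k_i$ were estimated by maximum likelihood. The maximum likelihood estimators (MLEs) of the two parameters must be computed numerically. The MLE $\hat{k}_i$ of $k_i$ is the solution of the following equation

$$\frac{\sum_{j=1}^{n_i} x_{ij}^{\hat{k}_i}\ln x_{ij}}{\sum_{j=1}^{n_i} x_{ij}^{\hat{k}_i}} - \frac{1}{\hat{k}_i} - \frac{1}{n_i}\sum_{j=1}^{n_i}\ln x_{ij} = 0,$$

and the MLE $\hat{c}_i$ of $c_i$ is given by

$$\hat{c}_i = \left(\frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}^{\hat{k}_i}\right)^{1/\hat{k}_i}.$$

Consider $p$ independent Weibull populations, the means of which are

$$\mu_i = c_i\,\Gamma\left(1 + \frac{1}{k_i}\right).$$

Thus, the estimator of $\mu_i$ is

$$\hat{\mu}_i = \hat{c}_i\,\Gamma\left(1 + \frac{1}{\hat{k}_i}\right), \tag{6}$$

where $\Gamma(\cdot)$ is the gamma function, which extends the factorial function to non-integer arguments. For a positive number $r$, the gamma function is defined as

$$\Gamma(r) = \int_0^{\infty} x^{r-1}e^{-x}\,dx.$$

An approximation can be used to determine the variance of an estimator. The delta method is a well-known approach for estimating the variance of $\hat{\mu}_i$ as follows:

$$\widehat{\mathrm{var}}(\hat{\mu}_i) \approx \left(\frac{\partial \mu_i}{\partial c_i}\right)^2 \widehat{\mathrm{var}}(\hat{c}_i) + \left(\frac{\partial \mu_i}{\partial k_i}\right)^2 \widehat{\mathrm{var}}(\hat{k}_i) + 2\,\frac{\partial \mu_i}{\partial c_i}\frac{\partial \mu_i}{\partial k_i}\,\widehat{\mathrm{cov}}(\hat{c}_i, \hat{k}_i), \tag{8}$$

with the partial derivatives evaluated at $(\hat{c}_i, \hat{k}_i)$. The covariance and variance estimates of $\hat{c}_i$ and $\hat{k}_i$ are calculated by using the Fisher information matrix (see Cohen (1965) for more detailed information).

The following notation is used in the Bayesian approach, where $c' = 1/c^k$:

$L(c',k|x)$: likelihood function
$\pi(c',k|x)$: posterior density function of $c'$ and $k$
$\pi(k|c',x)$: conditional posterior distribution of $k$
$\pi(c'|k,x)$: conditional posterior distribution of $c'$
$\pi(k)$: prior distribution of $k$
$\pi(c')$: prior distribution of $c'$

According to Graybill & Deal (1959), the estimator for the common mean $\mu$ is the weighted average of the means $\hat{\mu}_i$ based on the $p$ individual samples:

$$\hat{\mu} = \frac{\sum_{i=1}^{p} \hat{\mu}_i / \widehat{\mathrm{var}}(\hat{\mu}_i)}{\sum_{i=1}^{p} 1 / \widehat{\mathrm{var}}(\hat{\mu}_i)}, \tag{9}$$

where $\hat{\mu}_i$ is defined as in Eq. (6), and $\widehat{\mathrm{var}}(\hat{\mu}_i)$ is the variance estimate of $\hat{\mu}_i$, which is defined in Eq. (8).

Weerahandi (1993) introduced the GCI based on the concept of the GPQ. Let $X = (X_1, X_2, \ldots, X_n)$ be a random sample from a distribution whose probability density function depends on a parameter of interest $\phi$ and a nuisance parameter $\gamma$, and let $x = (x_1, x_2, \ldots, x_n)$ be the observed value of $X$. $R(X; x, \phi, \gamma)$ is called a GPQ if the following two properties hold: (i) the distribution of the random quantity $R(X; x, \phi, \gamma)$ is free of unknown parameters, and (ii) the observed value $r(x; x, \phi, \gamma)$ does not depend on the nuisance parameter. If $R(X; x, \phi, \gamma)$ satisfies these two properties, the quantiles of $R$ form a $100(1-\alpha)\%$ confidence interval. Now, let $R_\phi(\alpha)$ be the $100\alpha$-th percentile of $R(X; x, \phi, \gamma)$. Hence, the $100(1-\alpha)\%$ two-sided GCI for the parameter of interest is

$$\left[R_\phi(\alpha/2),\; R_\phi(1-\alpha/2)\right].$$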
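The estimation steps above (numerical MLEs, the plug-in mean, a delta-method variance, and the Graybill & Deal pooled estimator) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the likelihood equation is solved by bisection, the derivative with respect to $k$ is taken numerically, and the constants 1.1087, 0.6079, and 0.2570 are the commonly quoted large-sample approximations for the variances and covariance of the Weibull MLEs with complete samples, used here as an assumption because the exact expressions are not shown in this text.

```python
import math
import random


def weibull_mle(x):
    """Return (c_hat, k_hat) for a complete sample of positive, non-identical values."""
    n = len(x)
    logs = [math.log(v) for v in x]
    mean_log, max_log = sum(logs) / n, max(logs)

    def weights(k):
        # x_j^k rescaled by max(x)^k so that large k cannot overflow
        return [math.exp(k * (lg - max_log)) for lg in logs]

    def score(k):
        # Left-hand side of the likelihood equation for k (increasing in k)
        w = weights(k)
        return sum(wi * lg for wi, lg in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    lo, hi = 1e-3, 1.0
    while score(hi) < 0.0:      # expand until the root is bracketed
        hi *= 2.0
    for _ in range(100):        # bisection
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    k_hat = 0.5 * (lo + hi)
    c_hat = math.exp(max_log + math.log(sum(weights(k_hat)) / n) / k_hat)
    return c_hat, k_hat


def weibull_mean(c, k):
    """Weibull mean c * Gamma(1 + 1/k)."""
    return c * math.gamma(1.0 + 1.0 / k)


def delta_variance(c, k, n, h=1e-5):
    """Delta-method variance of the plug-in mean."""
    # ASSUMPTION: large-sample (co)variances of the MLEs for complete samples
    var_c = 1.1087 * c * c / (n * k * k)
    var_k = 0.6079 * k * k / n
    cov_ck = 0.2570 * c / n
    d_c = math.gamma(1.0 + 1.0 / k)                                      # d mu / d c
    d_k = (weibull_mean(c, k + h) - weibull_mean(c, k - h)) / (2.0 * h)  # d mu / d k, numerical
    return d_c ** 2 * var_c + d_k ** 2 * var_k + 2.0 * d_c * d_k * cov_ck


def pooled_mean(means, variances):
    """Inverse-variance weighted average of the individual means."""
    w = [1.0 / v for v in variances]
    return sum(wi * m for wi, m in zip(w, means)) / sum(w)


# Illustrative data: two samples sharing the common mean 1 (k = 2)
random.seed(1)
c_true = 1.0 / math.gamma(1.5)
samples = [[random.weibullvariate(c_true, 2.0) for _ in range(n)] for n in (50, 100)]
fits = [weibull_mle(s) for s in samples]
means = [weibull_mean(c, k) for c, k in fits]
variances = [delta_variance(c, k, len(s)) for (c, k), s in zip(fits, samples)]
print(pooled_mean(means, variances))
```

Bisection is used because the left-hand side of the likelihood equation is increasing in $k$, so the root is unique and easy to bracket; a Newton iteration would converge faster but needs a starting value.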

GENERALIZED CONFIDENCE INTERVAL
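Let $\hat{c}_{i0}$ and $\hat{k}_{i0}$ be the observed values of $\hat{c}_i$ and $\hat{k}_i$ based on a sample of size $n_i$ from Weibull$(c_i, k_i)$. Using the results from Thoman, Bain & Antle (1969), the distributions of $\hat{k}_i/k_i$ and $\hat{k}_i \ln(\hat{c}_i/c_i)$ do not depend on $c_i$ and $k_i$. Consequently, we see that $\hat{k}_i/k_i \sim \hat{k}_i^*$ and $\hat{k}_i \ln(\hat{c}_i/c_i) \sim \hat{k}_i^* \ln(\hat{c}_i^*)$, where $\hat{c}_i^*$ and $\hat{k}_i^*$ are the MLEs based on a sample of size $n_i$ from Weibull$(1,1)$. The GPQs of the shape and scale parameters of Weibull distributions were given in Krishnamoorthy, Mukherjee & Guo (2007) as

$$R_{k_i} = \frac{\hat{k}_{i0}}{\hat{k}_i^*}$$

and

$$R_{c_i} = \hat{c}_{i0}\left(\frac{1}{\hat{c}_i^*}\right)^{\hat{k}_i^*/\hat{k}_{i0}}.$$

The GPQ for estimating $\mu$ based on the $i$-th sample is determined as

$$R_{\mu_i} = R_{c_i}\,\Gamma\left(1 + \frac{1}{R_{k_i}}\right).$$

The GPQ for the common mean is a weighted average of the GPQs $R_{\mu_i}$ based on the $p$ individual samples:

$$R_{\mu} = \frac{\sum_{i=1}^{p} R_{\mu_i} / R_{\mathrm{var}(\hat{\mu}_i)}}{\sum_{i=1}^{p} 1 / R_{\mathrm{var}(\hat{\mu}_i)}},$$

where $R_{\mathrm{var}(\hat{\mu}_i)}$ is a GPQ of $\mathrm{var}(\hat{\mu}_i)$. As a result, the $100(1-\alpha)\%$ two-sided confidence interval for the common mean using GCI is

$$\left[L_{GCI.\mu},\; U_{GCI.\mu}\right] = \left[R_{\mu}(\alpha/2),\; R_{\mu}(1-\alpha/2)\right],$$

where $R_{\mu}(\alpha/2)$ is the $100(\alpha/2)$-th percentile of $R_{\mu}$.
The following algorithm is used to construct $L_{GCI.\mu}$ and $U_{GCI.\mu}$.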

Algorithm 1
For g = 1 to m, where m is the number of generalized computations
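Since the remaining steps of Algorithm 1 are not shown in this text, the loop can be sketched under stated assumptions as follows. In each of the $m$ repetitions, a Weibull(1,1) sample of size $n_i$ is drawn for every population, the GPQs $R_{k_i}$, $R_{c_i}$, and $R_{\mu_i}$ are computed, and the weighted average $R_\mu$ is stored; the interval is then read off the percentiles of the stored values. The function names are hypothetical, and evaluating the delta-method variance at $(R_{c_i}, R_{k_i})$ with large-sample constants is an assumption about how $R_{\mathrm{var}(\hat{\mu}_i)}$ is formed.

```python
import math
import random


def weibull_mle(x, iters=60):
    """Return (c_hat, k_hat); the likelihood equation for k is solved by bisection."""
    n = len(x)
    logs = [math.log(v) for v in x]
    mean_log, max_log = sum(logs) / n, max(logs)

    def weights(k):
        return [math.exp(k * (lg - max_log)) for lg in logs]

    def score(k):
        w = weights(k)
        return sum(wi * lg for wi, lg in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    lo, hi = 1e-3, 1.0
    while score(hi) < 0.0:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if score(mid) < 0.0 else (lo, mid)
    k_hat = 0.5 * (lo + hi)
    return math.exp(max_log + math.log(sum(weights(k_hat)) / n) / k_hat), k_hat


def delta_variance(c, k, n, h=1e-5):
    # ASSUMPTION: large-sample (co)variances of the MLEs (constants are approximations)
    mean = lambda kk: c * math.gamma(1.0 + 1.0 / kk)
    d_c = math.gamma(1.0 + 1.0 / k)
    d_k = (mean(k + h) - mean(k - h)) / (2.0 * h)
    return (d_c ** 2 * 1.1087 * c * c / (n * k * k) + d_k ** 2 * 0.6079 * k * k / n
            + 2.0 * d_c * d_k * 0.2570 * c / n)


def gci_common_mean(samples, m=2500, alpha=0.05, rng=random):
    """Percentile interval of the GPQ R_mu over m generalized computations."""
    observed = [weibull_mle(s) for s in samples]          # (c_i0, k_i0)
    r_mu_values = []
    for _ in range(m):
        num = den = 0.0
        for (c0, k0), s in zip(observed, samples):
            n = len(s)
            star = [rng.weibullvariate(1.0, 1.0) for _ in range(n)]
            c_star, k_star = weibull_mle(star)            # MLEs from Weibull(1, 1)
            r_k = k0 / k_star
            r_c = c0 * (1.0 / c_star) ** (k_star / k0)
            r_mu_i = r_c * math.gamma(1.0 + 1.0 / r_k)
            r_var_i = delta_variance(r_c, r_k, n)         # ASSUMPTION: GPQ of the variance
            num += r_mu_i / r_var_i
            den += 1.0 / r_var_i
        r_mu_values.append(num / den)
    r_mu_values.sort()
    lower = r_mu_values[int(math.floor(alpha / 2.0 * (m - 1)))]
    upper = r_mu_values[int(math.ceil((1.0 - alpha / 2.0) * (m - 1)))]
    return lower, upper


# Illustrative data: two samples sharing the common mean 1 (k = 2)
random.seed(2)
c_true = 1.0 / math.gamma(1.5)
data = [[random.weibullvariate(c_true, 2.0) for _ in range(n)] for n in (20, 30)]
print(gci_common_mean(data, m=500))
```

The demonstration uses a smaller $m$ than the 2,500 pivotal quantities reported for the simulation study, only to keep the run short.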

ADJUSTED METHOD OF VARIANCE ESTIMATES RECOVERY
MOVER was originally introduced by Donner & Zou (2010); we combined it with the large sample method to obtain the adjusted MOVER interval. Again, the pooled estimator of the common mean can be defined as in Eq. (9).
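Consider two parameters of interest $\mu_1$ and $\mu_2$ with $\hat{\mu}_1$ and $\hat{\mu}_2$ as their respective estimators. Assuming that $\hat{\mu}_1$ and $\hat{\mu}_2$ are independent, the lower limit $L$ and the upper limit $U$ for $\mu_1 + \mu_2$ can be defined as

$$[L, U] = (\hat{\mu}_1 + \hat{\mu}_2) \pm z_{\alpha/2}\sqrt{\widehat{\mathrm{var}}(\hat{\mu}_1) + \widehat{\mathrm{var}}(\hat{\mu}_2)}, \tag{15}$$

where $z_{\alpha/2}$ is the upper $100(\alpha/2)$-th percentile of the standard normal distribution. By using the central limit theorem, the variance estimates for $\hat{\mu}_i$ at $\mu_i = l_i$, $i = 1,2$, are given by

$$\widehat{\mathrm{var}}(\hat{\mu}_1) = \frac{(\hat{\mu}_1 - l_1)^2}{z_{\alpha/2}^2} \tag{16}$$

and

$$\widehat{\mathrm{var}}(\hat{\mu}_2) = \frac{(\hat{\mu}_2 - l_2)^2}{z_{\alpha/2}^2}, \tag{17}$$

where $l_1$ and $l_2$ are the lower limits of $\mu_1$ and $\mu_2$, respectively. Furthermore, the variance estimates for $\hat{\mu}_i$ at $\mu_i = u_i$, $i = 1,2$, are given by

$$\widehat{\mathrm{var}}(\hat{\mu}_1) = \frac{(u_1 - \hat{\mu}_1)^2}{z_{\alpha/2}^2} \tag{18}$$

and

$$\widehat{\mathrm{var}}(\hat{\mu}_2) = \frac{(u_2 - \hat{\mu}_2)^2}{z_{\alpha/2}^2}, \tag{19}$$

where $u_1$ and $u_2$ are the upper limits of $\mu_1$ and $\mu_2$, respectively.
Based on Eqs. (15)-(19), the $100(1-\alpha)\%$ confidence limits for $\mu_1 + \mu_2$ can be written as

$$L = \hat{\mu}_1 + \hat{\mu}_2 - \sqrt{(\hat{\mu}_1 - l_1)^2 + (\hat{\mu}_2 - l_2)^2}$$

and

$$U = \hat{\mu}_1 + \hat{\mu}_2 + \sqrt{(u_1 - \hat{\mu}_1)^2 + (u_2 - \hat{\mu}_2)^2}.$$

For $p$ independent samples to which adjusted MOVER is applied, the lower limit $L$ and upper limit $U$ for the sum of the $\mu_i$ can be written as

$$L = \sum_{i=1}^{p}\hat{\mu}_i - \sqrt{\sum_{i=1}^{p}(\hat{\mu}_i - l_i)^2}, \qquad U = \sum_{i=1}^{p}\hat{\mu}_i + \sqrt{\sum_{i=1}^{p}(u_i - \hat{\mu}_i)^2}.$$

The variance estimates of $\hat{\mu}_i$ at $\mu_i = l_i$ and $\mu_i = u_i$, where $i = 1,2,\ldots,p$, are given by

$$\widehat{\mathrm{var}}_{l}(\hat{\mu}_i) = \frac{(\hat{\mu}_i - l_i)^2}{z_{\alpha/2}^2}$$

and

$$\widehat{\mathrm{var}}_{u}(\hat{\mu}_i) = \frac{(u_i - \hat{\mu}_i)^2}{z_{\alpha/2}^2}.$$

In the present study, the lower and upper limits of $\mu_i$ are based on the Wald confidence interval as follows:

$$[l_i, u_i] = \hat{\mu}_i \pm z_{\alpha/2}\sqrt{\widehat{\mathrm{var}}(\hat{\mu}_i)},$$

with $\widehat{\mathrm{var}}(\hat{\mu}_i)$ as in Eq. (8). When using the large sample concept to perform interval estimation for $\mu$, the variance estimate of $\hat{\mu}_i$ can be defined as

$$\widehat{\mathrm{var}}(\hat{\mu}_i) = \frac{1}{2}\left[\frac{(\hat{\mu}_i - l_i)^2}{z_{\alpha/2}^2} + \frac{(u_i - \hat{\mu}_i)^2}{z_{\alpha/2}^2}\right].$$

Therefore, the $100(1-\alpha)\%$ two-sided confidence interval for the common mean using the adjusted MOVER with the Wald confidence interval becomes

$$L_{AM.\mu} = \hat{\mu} - z_{\alpha/2}\sqrt{1 \Big/ \sum_{i=1}^{p}\frac{z_{\alpha/2}^2}{(\hat{\mu}_i - l_i)^2}}$$

and

$$U_{AM.\mu} = \hat{\mu} + z_{\alpha/2}\sqrt{1 \Big/ \sum_{i=1}^{p}\frac{z_{\alpha/2}^2}{(u_i - \hat{\mu}_i)^2}},$$

where $\hat{\mu}$ is the pooled estimator in Eq. (9) and $\hat{\mu}_i$ is defined as in Eq. (6).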
The following algorithm is used to construct $L_{AM.\mu}$ and $U_{AM.\mu}$.
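The steps of this algorithm are not shown in this text, so the following is only a sketch of one common formulation of adjusted MOVER with Wald limits, taking the individual estimates $\hat{\mu}_i$ and their variance estimates as inputs. The function name and the input values are hypothetical. Note that with Wald limits the recovered variances coincide with the input variances, so the interval reduces to the pooled estimator plus or minus $z_{\alpha/2}$ times the large-sample standard error.

```python
import math
from statistics import NormalDist


def adjusted_mover(means, variances, alpha=0.05):
    """Adjusted MOVER interval for a common mean, using Wald limits for each mean."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)          # upper alpha/2 normal quantile
    lower = [m - z * math.sqrt(v) for m, v in zip(means, variances)]   # Wald l_i
    upper = [m + z * math.sqrt(v) for m, v in zip(means, variances)]   # Wald u_i

    # Variance estimates recovered at the lower and upper limits
    var_l = [((m - l) / z) ** 2 for m, l in zip(means, lower)]
    var_u = [((u - m) / z) ** 2 for m, u in zip(means, upper)]

    # Pooled estimator weighted by the average of the two recovered variances
    weights = [1.0 / (0.5 * (a + b)) for a, b in zip(var_l, var_u)]
    pooled = sum(w * m for w, m in zip(weights, means)) / sum(weights)

    low = pooled - z * math.sqrt(1.0 / sum(1.0 / v for v in var_l))
    high = pooled + z * math.sqrt(1.0 / sum(1.0 / v for v in var_u))
    return low, high


# Hypothetical estimates from two samples
print(adjusted_mover([1.0, 1.2], [0.04, 0.09]))
```

Other interval methods for $l_i$ and $u_i$ (for example, asymmetric ones) could be substituted for the Wald limits, in which case the lower and upper recovered variances would differ.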

BAYESIAN CONFIDENCE INTERVAL
Bayesian methodology is based on Bayes' theorem for updating probabilities using prior knowledge. The posterior probability is obtained by combining a prior probability distribution with a likelihood function. Here, Bayesian methods for establishing the confidence interval for the common mean of several Weibull distributions are presented. Assume $X$ is a random variable with a Weibull distribution. If $c' = 1/c^k$, then the pdf can be expressed as

$$f(x; c', k) = c' k\, x^{k-1}\exp\left(-c' x^{k}\right), \quad x > 0.$$

A Bayesian confidence interval estimate is constructed based on the posterior distribution, a conditional distribution derived from the observed sample data that is used to gain information about the parameter, which is regarded as a random quantity. This is achieved in accordance with the relationship posterior distribution ∝ prior distribution × likelihood function. Hence, we have to provide a suitable prior distribution and likelihood function. In this study, we assumed that the shape and scale parameters follow gamma prior distributions; i.e.,

$$\pi(k) \propto k^{v_1 - 1}\exp(-z_1 k)$$

and

$$\pi(c') \propto c'^{\,v_2 - 1}\exp(-z_2 c'),$$

where $v_1, z_1, v_2, z_2$ are the hyperparameters. As a consequence, the joint posterior density function of $c'$ and $k$ given $x$ can be written as

$$\pi(c', k|x) \propto L(c', k|x)\,\pi(c')\,\pi(k),$$

and the likelihood function $L(c', k|x)$ is given by

$$L(c', k|x) = (c' k)^{n}\prod_{j=1}^{n} x_j^{k-1}\exp\left(-c'\sum_{j=1}^{n} x_j^{k}\right).$$

Because $\pi(c', k|x)$ cannot be obtained in closed form for the Weibull distribution, we used Gibbs sampling, a Markov chain Monte Carlo (MCMC) method introduced by Geman & Geman (1984), to generate a sample from the posterior density function. The MCMC method is widely used for Bayesian computation in complex statistical models. It generates samples by running a properly constructed Markov chain for an extended period of time. The Gibbs sampler requires samples from the full conditional distributions, which is computationally intensive. The respective conditional posterior distributions of the shape and scale parameters are

$$\pi(k|c', x) \propto k^{n + v_1 - 1}\exp\left(-z_1 k - c'\sum_{j=1}^{n} x_j^{k}\right)\prod_{j=1}^{n} x_j^{k-1}$$

and

$$\pi(c'|k, x) \sim \mathrm{gamma}\left(n + v_2,\; z_2 + \sum_{j=1}^{n} x_j^{k}\right).$$
Although Gibbs sampling could be applied directly to the conditional posterior distribution of the scale parameter, the conditional posterior distribution of the shape parameter does not have a closed form, so Gibbs sampling could not be applied in a straightforward manner. Therefore, the Random Walk Metropolis (RWM) algorithm was utilized to generate random samples from this distribution. Similar to acceptance-rejection sampling, the algorithm accepts the proposed value with a certain probability at each iteration to ensure that the Markov chain converges to the target density (Saraiva & Suzuki, 2017). When the RWM algorithm is used to update the shape parameter, the proposed value $\tilde{k}$ is accepted with probability $\min(1, A_k)$, where $A_k$ is defined by

$$A_k = \frac{\pi(\tilde{k}\,|\,c'^{(t)}, x)}{\pi(k^{(t-1)}\,|\,c'^{(t)}, x)}, \tag{37}$$

where $c'^{(t)}$ and $k^{(t)}$, $t = 1,2,\ldots,T$, are the Bayesian estimators of $c'$ and $k$ based on Gibbs sampling, respectively. Subsequently, we used the following algorithms to generate the samples and compute the Bayesian estimates.
Algorithm 3 The Gibbs algorithm
1. Consider the initial state of the parameters $(c'^{(0)}, k^{(0)})$.
For t = 1 to T, where T is the number of MCMC iterations using Gibbs sampling
2. Generate $c'^{(t)} \sim \mathrm{gamma}\left(n + v_2,\; z_2 + \sum_{j=1}^{n} x_j^{k^{(t-1)}}\right)$.
3. Update $k^{(t)}$ using the RWM algorithm.
End t loop
4. Discard the first 1,000 samples.

Algorithm 4 RWM
1. Take the initial state $(c'^{(t)}, k^{(t-1)})$.
2. Generate $\varepsilon$ from the normal distribution with parameters $(0, \sigma_k^2)$.
3. Calculate $\tilde{k} = k^{(t-1)} + \varepsilon$.
4. Calculate $A_k$ as given in Eq. (37).
5. Generate $u$ from the uniform distribution with parameters $(0, 1)$.
6. Set $k^{(t)} = \tilde{k}$ if $u \le \min(1, A_k)$; otherwise set $k^{(t)} = k^{(t-1)}$.

Again, let $X_i = (X_{i1}, X_{i2}, \ldots, X_{in_i})$ be a random sample from a Weibull distribution with size $n_i$, scale parameter $c_i$, and shape parameter $k_i$. The pooled estimator for the common mean based on the Bayesian method is

$$\hat{\mu}^{(t)} = \frac{\sum_{i=1}^{p} \hat{\mu}_i^{(t)} / \widehat{\mathrm{var}}(\hat{\mu}_i^{(t)})}{\sum_{i=1}^{p} 1 / \widehat{\mathrm{var}}(\hat{\mu}_i^{(t)})}, \quad i = 1,2,\ldots,p,\; t = 1,2,\ldots,T, \tag{38}$$

where $\widehat{\mathrm{var}}(\hat{\mu}_i^{(t)})$ is the variance estimate of $\hat{\mu}_i^{(t)}$, which is obtained from Eq. (8). After computing the Bayesian estimates by following Algorithms 3 and 4 and Eq. (38), the confidence interval for $\mu$ can be constructed. The $100(1-\alpha)\%$ two-sided confidence interval for the common mean using the Bayesian method is given by

$$\left[L_{B.\mu},\; U_{B.\mu}\right],$$

where $L_{B.\mu}$ and $U_{B.\mu}$ are the lower and upper bounds, respectively, of either the $100(1-\alpha)\%$ equitailed confidence interval or the HPD interval of $\mu$. The HDInterval package in R was used to compute the HPD interval. The defining property of the HPD interval is that all the values inside it have a higher probability density than any value outside of it, and thus it includes the most credible values (Kruschke, 2015). In addition, it is the narrowest interval with the given probability content in the domain of the posterior probability distribution.
The following algorithm is used to construct $L_{B.\mu}$ and $U_{B.\mu}$.
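The steps of this algorithm are not shown in this text; the sketch below illustrates how Algorithms 3 and 4 and Eq. (38) could be combined to produce both Bayesian intervals. It is a hedged illustration only: the function names are hypothetical, the hyperparameter values (all set to 1e-4), the proposal standard deviation, and the starting values are assumptions, the variance in Eq. (8) is evaluated with large-sample constants, and the HPD interval is computed as the shortest window of sorted draws instead of calling the HDInterval package.

```python
import math
import random


def gibbs_weibull(x, T=3000, burn=500, sigma_k=0.2,
                  v1=1e-4, z1=1e-4, v2=1e-4, z2=1e-4, rng=random):
    """Gibbs sampler for (c', k) with a random walk Metropolis step for k."""
    n = len(x)
    sum_log = sum(math.log(v) for v in x)

    def sum_pow(k):
        return sum(v ** k for v in x)

    def log_post_k(k, c_prime):
        # Log conditional posterior of k, up to an additive constant
        return (n + v1 - 1.0) * math.log(k) - z1 * k + k * sum_log - c_prime * sum_pow(k)

    c_prime, k = 1.0, 1.0                       # ASSUMPTION: arbitrary starting values
    draws = []
    for t in range(T):
        # gammavariate takes (shape, scale), so the rate is inverted
        c_prime = rng.gammavariate(n + v2, 1.0 / (z2 + sum_pow(k)))
        k_new = k + rng.gauss(0.0, sigma_k)     # random walk proposal
        if k_new > 0.0:
            log_a = log_post_k(k_new, c_prime) - log_post_k(k, c_prime)
            if math.log(1.0 - rng.random()) <= log_a:
                k = k_new
        if t >= burn:
            draws.append((c_prime, k))
    return draws


def mean_and_variance(c_prime, k, n, h=1e-5):
    """Mean and delta-method variance for one posterior draw."""
    c = c_prime ** (-1.0 / k)                   # back-transform c' = 1 / c^k
    mean = lambda kk: c * math.gamma(1.0 + 1.0 / kk)
    d_c = math.gamma(1.0 + 1.0 / k)
    d_k = (mean(k + h) - mean(k - h)) / (2.0 * h)
    # ASSUMPTION: large-sample (co)variance constants for the Weibull MLEs
    var = (d_c ** 2 * 1.1087 * c * c / (n * k * k) + d_k ** 2 * 0.6079 * k * k / n
           + 2.0 * d_c * d_k * 0.2570 * c / n)
    return mean(k), var


def pooled_draws(samples, **kwargs):
    """Pooled common-mean value for each retained iteration."""
    chains = [gibbs_weibull(s, **kwargs) for s in samples]
    pooled = []
    for state in zip(*chains):
        num = den = 0.0
        for (c_prime, k), s in zip(state, samples):
            m, v = mean_and_variance(c_prime, k, len(s))
            num += m / v
            den += 1.0 / v
        pooled.append(num / den)
    return pooled


def intervals(draws, alpha=0.05):
    """Return (equitailed interval, HPD interval) from posterior draws."""
    s = sorted(draws)
    n = len(s)
    span = int((1.0 - alpha) * n)               # index distance covered by each interval
    a = (n - span) // 2                         # trim equally from both tails
    equitailed = (s[a], s[a + span])
    i = min(range(n - span), key=lambda j: s[j + span] - s[j])   # shortest window
    return equitailed, (s[i], s[i + span])


# Illustrative data: two samples sharing the common mean 1 (k = 2)
random.seed(4)
c_true = 1.0 / math.gamma(1.5)
data = [[random.weibullvariate(c_true, 2.0) for _ in range(50)] for _ in range(2)]
print(intervals(pooled_draws(data)))
```

Because the equitailed interval is itself one of the candidate windows, the HPD interval computed this way can never be longer than the equitailed interval from the same draws.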

RESULTS
A simulation study was conducted using the R statistical package. The coverage probabilities and expected lengths of the confidence interval methods were used to evaluate their performance. The data were generated from several independent Weibull distributions denoted as Weibull$(c_i, k_i)$, where $k_i = 2$ and $c_i = \mu/\Gamma(1 + 1/k_i)$ for $i = 1,2,\ldots,p$; common mean $\mu$ = 0.5, 1, 5, or 10; the number of samples $p$ = 2, 4, or 6; and the sample sizes $n_i$ are provided in Tables 1-3. For each set of parameters, we conducted 5,000 simulation runs, with 2,500 pivotal quantities for GCI and 20,000 MCMC realizations using Gibbs sampling with a burn-in of 1,000 for the Bayesian methods. The method with a coverage probability above the nominal confidence level of 0.95 and the shortest expected length was considered the best-performing one for each scenario. The simulation results for $p$ = 2, 4, and 6 are reported in Tables 1-3, respectively. Figure 2 shows the algorithm used to estimate the coverage probabilities and expected lengths of the methods.
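The simulation design can be sketched as follows: data are generated with a scale parameter chosen so that every population has the same mean, and the coverage probability and expected length are estimated as the proportion of intervals containing $\mu$ and the average interval width. The function names are hypothetical, and `naive_z_interval` is only a stand-in (a normal-approximation interval around an inverse-variance weighted sample mean) so that the sketch runs on its own; it is not one of the four methods compared in this study.

```python
import math
import random
from statistics import NormalDist, mean, variance


def scale_for_mean(mu, k):
    """Scale parameter c such that c * Gamma(1 + 1/k) equals the common mean mu."""
    return mu / math.gamma(1.0 + 1.0 / k)


def simulate(interval_fn, mu, sizes, k=2.0, runs=500, rng=random):
    """Estimate coverage probability and expected length of interval_fn."""
    c = scale_for_mean(mu, k)
    hits, total_length = 0, 0.0
    for _ in range(runs):
        samples = [[rng.weibullvariate(c, k) for _ in range(n)] for n in sizes]
        low, high = interval_fn(samples)
        hits += low <= mu <= high               # True counts as 1
        total_length += high - low
    return hits / runs, total_length / runs


def naive_z_interval(samples, alpha=0.05):
    # STAND-IN ONLY: not one of the methods compared in the study
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    weights = [len(s) / variance(s) for s in samples]   # 1 / var(sample mean)
    estimate = sum(w * mean(s) for w, s in zip(weights, samples)) / sum(weights)
    half_width = z * math.sqrt(1.0 / sum(weights))
    return estimate - half_width, estimate + half_width


random.seed(3)
coverage, expected_length = simulate(naive_z_interval, mu=1.0, sizes=(30, 30))
print(coverage, expected_length)
```

Any of the interval constructions in the previous sections could be passed as `interval_fn`, provided it accepts a list of samples and returns a lower and an upper limit.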
For $p$ = 2, the coverage probabilities calculated using the GCI method were larger than or close to the nominal confidence level for all sample sizes. The Bayesian equitailed interval method was satisfactory in most cases, while the Bayesian HPD method only performed well when $\mu$ = 0.5 or 1 for large sample sizes. In contrast, the coverage probabilities of the adjusted MOVER method did not reach the nominal level in any situation. For $p$ = 4, the coverage probabilities of the GCI method were slightly smaller than 0.95 but improved with larger sample sizes, whereas adjusted MOVER still performed badly. Meanwhile, the Bayesian methods had coverage probabilities higher than 0.95 only when $\mu$ = 0.5 for large sample sizes. Similar results were obtained for $p$ = 6. Finally, the coverage probabilities and expected lengths of the proposed methods for varying numbers of samples, sample sizes, and common means are summarized in Figs. 3-5.

Notes (Tables 1-3). $10^6$ stands for (10,10,10,10,10,10). Bold values denote a coverage probability higher than the nominal confidence level and the shortest expected length.

APPLICATION OF THE METHODS TO ESTIMATE WIND SPEED DATA FROM VARIOUS AREAS OF SURAT THANI
Surat Thani is the largest province in southern Thailand and is located on the west coast of the Gulf of Thailand. Ten years of monthly wind speed data were obtained from weather stations in three districts: Khiri Rat Nikhom, Koh Samui, and Kanchanadit (Table 4). The summary statistics of the data are displayed in Table 5. First, we used the Akaike Information Criterion (AIC) to check whether the Weibull distribution was appropriate for these datasets; the results in Table 6 show that this was indeed the case, since the Weibull distribution had the smallest AIC value. Moreover, the Q-Q plots of the datasets in Fig. 6 indicate that the Weibull distribution is appropriate, which is confirmed by the p-values for the three areas of 0.3708, 0.4826, and 0.5681, respectively. The estimated common mean of the datasets is 0.8869. The 95% interval estimation results for the common mean computed by using all four methods are summarized in Table 7. Furthermore, a trace plot of the generated $\mu$ values is shown in Fig. 7. It can be seen that these findings confirm the simulation results for a large sample size. Adjusted MOVER provided the shortest expected length, while that of the Bayesian HPD interval was smaller than those of GCI and the Bayesian equitailed confidence interval. However, once again adjusted MOVER yielded a coverage probability that was lower than those of the other methods and did not reach the target. Meanwhile, both Bayesian methods yielded coverage probabilities higher than the target, and the expected length of the Bayesian HPD interval was slightly narrower than that of the Bayesian equitailed confidence interval. Therefore, the Bayesian HPD interval is the most suitable for estimating the confidence interval for the common mean of several Weibull distributions for large sample sizes.

DISCUSSION
La-ongkaew,  proposed Bayesian methods using the gamma prior for estimating the difference between the parameter values of several Weibull distributions and applied them to wind speed data measured at wind energy stations in Thailand. We extended this idea to construct estimates for the confidence interval for the common mean of several Weibull distributions using GCI, adjusted MOVER, and the Bayesian equitailed confidence interval and HPD interval both based on the gamma prior.